g03adf

g03adf © Numerical Algorithms Group, 2002.

Purpose

G03ADF performs canonical correlation analysis.

Synopsis

[e,ncv,cvx,cvy,ifail] = g03adf(z,isz<,wt,tol,weight,ifail>)

Description

 
 Let there be two sets of variables, x and y. For a sample of n
 observations on $n_x$ variables in a data matrix X and $n_y$ variables
 in a data matrix Y, canonical correlation analysis seeks to find
 a small number of linear combinations of each set of variables in
 order to explain or summarise the relationships between them. The
 variables thus formed are known as canonical variates.
 
 Let the variance-covariance matrix of the two data sets be

     $$\begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}$$

 and let

     $$\Sigma = S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$$
 
 then the canonical correlations can be calculated from the
 eigenvalues of the matrix $\Sigma$. However, G03ADF calculates the
 canonical correlations by means of a singular value decomposition
 (SVD) of a matrix V. If the rank of the data matrix X is $k_x$, the
 rank of the data matrix Y is $k_y$, and both X and Y have had their
 variable (column) means subtracted, then the $k_x$ by $k_y$ matrix V
 is given by:

     $$V = Q_x^T Q_y,$$
 
 where $Q_x$ is the first $k_x$ columns of the orthogonal matrix Q
 either from the QR decomposition of X if X is of full column rank,
 i.e., $k_x = n_x$:

     $$X = Q_x R_x$$

 or from the SVD of X if $k_x < n_x$:

     $$X = Q_x D_x P_x^T.$$

 Similarly, $Q_y$ is the first $k_y$ columns of the orthogonal matrix Q
 either from the QR decomposition of Y if Y is of full column rank,
 i.e., $k_y = n_y$:

     $$Y = Q_y R_y$$

 or from the SVD of Y if $k_y < n_y$:

     $$Y = Q_y D_y P_y^T.$$
 
 Let the SVD of V be:

     $$V = U_x \Delta U_y^T$$

 then the non-zero elements of the diagonal matrix $\Delta$, $\delta_i$
 for $i = 1, 2, \ldots, l$, are the l canonical correlations associated
 with the l canonical variates, where $l = \min(k_x, k_y)$.
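 The construction above can be sketched in a few lines. This is only an
 illustration under stated assumptions, not the routine's implementation:
 it handles exactly two variables in each set, assumes both data matrices
 have full column rank, obtains $Q_x$ and $Q_y$ by modified Gram-Schmidt
 rather than a library QR, and uses the closed form for the singular
 values of a 2 by 2 matrix. All function names are hypothetical.

```python
import math

def center(cols):
    """Subtract each column's mean (columns are lists of equal length)."""
    return [[v - sum(c) / len(c) for v in c] for c in cols]

def orthonormal_basis(cols):
    """Modified Gram-Schmidt; assumes the columns are linearly independent."""
    basis = []
    for c in cols:
        q = list(c)
        for b in basis:
            proj = sum(qi * bi for qi, bi in zip(q, b))
            q = [qi - proj * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(sum(qi * qi for qi in q))
        basis.append([qi / norm for qi in q])
    return basis

def canonical_correlations_2x2(x_cols, y_cols):
    """Singular values of V = Q_x^T Q_y for two variables in each set."""
    qx = orthonormal_basis(center(x_cols))
    qy = orthonormal_basis(center(y_cols))
    v = [[sum(a * b for a, b in zip(qxi, qyj)) for qyj in qy] for qxi in qx]
    # For a 2x2 matrix the squared singular values z solve
    # z^2 - t*z + d^2 = 0, with t = squared Frobenius norm and d = det(V).
    t = sum(e * e for row in v for e in row)
    d = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    root = math.sqrt(max(t * t - 4.0 * d * d, 0.0))
    return [math.sqrt(max((t + root) / 2.0, 0.0)),
            math.sqrt(max((t - root) / 2.0, 0.0))]

# Illustrative data only: each inner list is one variable (one column).
x = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
y = [[1.0, -1.0, 1.0, -1.0], [1.0, -1.0, -1.0, 1.0]]
deltas = canonical_correlations_2x2(x, y)
print(deltas)
```

 In this toy data the first y variable duplicates the first x variable,
 and the second y variable is orthogonal to both x variables, so the two
 canonical correlations come out as 1 and 0.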
 
 The eigenvalues, $\lambda_i^2$, of the matrix $\Sigma$ are given by:

     $$\lambda_i^2 = \frac{\delta_i^2}{1 + \delta_i^2}.$$
 
 The value of $\pi_i = \lambda_i^2 / \sum_j \lambda_j^2$ gives the
 proportion of variation explained by the ith canonical variate. The
 values of the $\pi_i$ give an indication of how many canonical variates
 are needed to adequately describe the data, i.e., the
 dimensionality of the problem.
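
 As a small worked illustration of the proportions, the sketch below
 normalises a list of eigenvalues and accumulates the result. The
 eigenvalues are hypothetical numbers chosen for the example, and the
 function name is not part of the library.

```python
def proportions_of_variation(lambda_sq):
    """Return pi_i = lambda_i^2 / sum_j lambda_j^2 for each eigenvalue."""
    total = sum(lambda_sq)
    return [v / total for v in lambda_sq]

# Hypothetical eigenvalues lambda_i^2, chosen only for illustration.
lambda_sq = [0.64, 0.25, 0.11]
pi = proportions_of_variation(lambda_sq)

# Cumulative proportions help judge how many variates are worth keeping.
cumulative = []
running = 0.0
for p in pi:
    running += p
    cumulative.append(running)

print(pi)
print(cumulative)
```

 A cumulative proportion close to 1 after the first one or two variates
 suggests that the remaining variates add little.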
 
 To test for a significant dimensionality greater than i, the
 $\chi^2$ statistic:

     $$\left(n - \frac{1}{2}(k_x + k_y)\right) \sum_{j=i+1}^{l} \log(1 + \lambda_j^2)$$

 can be used. This is asymptotically distributed as a $\chi^2$
 distribution with $(k_x - i)(k_y - i)$ degrees of freedom. If the test
 for $i = k_{\min}$ is not significant, then the remaining tests for
 $i > k_{\min}$ should be ignored.
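
 The sequence of tests can be tabulated as follows. This sketch simply
 transcribes the statistic and degrees of freedom as printed in this
 section, assuming the sum runs over all remaining eigenvalues; the
 inputs are hypothetical and the function name is not part of the
 library. Each statistic would then be compared with a chi-square
 distribution on the given degrees of freedom.

```python
import math

def dimensionality_tests(lambda_sq, n, k_x, k_y):
    """Return (i, statistic, degrees of freedom) for i = 0, 1, ..., l-1."""
    factor = n - 0.5 * (k_x + k_y)
    results = []
    for i in range(len(lambda_sq)):
        # Sum over j = i+1, ..., l in the text's 1-based indexing.
        stat = factor * sum(math.log(1.0 + v) for v in lambda_sq[i:])
        dof = (k_x - i) * (k_y - i)
        results.append((i, stat, dof))
    return results

# Hypothetical inputs: 50 observations, ranks 3 and 2, two eigenvalues.
for i, stat, dof in dimensionality_tests([0.8, 0.2], n=50, k_x=3, k_y=2):
    print(i, stat, dof)
```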
 
 The loadings for the canonical variates are calculated from the
 matrices $U_x$ and $U_y$ respectively. These matrices are scaled so
 that the canonical variates have unit variance.

Parameters

g03adf

Required Input Arguments:

z (:,:)                               real
isz (:)                               integer

Optional Input Arguments:                       <Default>

wt (:)                                real     zeros(size(z,1),1)
tol                                   real     sqrt(eps)
weight (1)                            string   'u'
ifail                                 integer  -1

Output Arguments:

e (:,6)                               real
ncv                                   integer
cvx (:,:)                             real
cvy (:,:)                             real
ifail                                 integer